
Conversation

@meenchen (Contributor) commented Jan 8, 2026

What does this PR do?

Type of change: Bug fix

Overview: For qwen3moe and qwen3next models quantized with nvfp4, disable quantization of the attention projection layers (in_proj/q_proj/k_proj/v_proj) to retain accuracy during PTQ.

Usage

# Add a code snippet demonstrating how to use this
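A minimal sketch of how the overrides added in this PR could be applied with modelopt's PTQ API. The NVFP4_DEFAULT_CFG starting point, the model, and calib_dataloader are assumptions for illustration, not part of this PR:

import copy

import modelopt.torch.quantization as mtq

# Start from the stock NVFP4 config and add the same wildcard overrides this PR
# hardcodes for qwen3moe / qwen3next.
quant_cfg = copy.deepcopy(mtq.NVFP4_DEFAULT_CFG)
for pattern in [
    "model*.*attn*in_proj*",
    "model*.*attn*q_proj*",
    "model*.*attn*k_proj*",
    "model*.*attn*v_proj*",
]:
    quant_cfg["quant_cfg"][pattern] = {"enable": False}

def forward_loop(model):
    # Push a small calibration set through the model; calib_dataloader is a placeholder.
    for batch in calib_dataloader:
        model(**batch)

model = mtq.quantize(model, quant_cfg, forward_loop)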

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

copy-pr-bot bot commented Jan 8, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

codecov bot commented Jan 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.66%. Comparing base (68d604d) to head (6935660).
⚠️ Report is 20 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #749   +/-   ##
=======================================
  Coverage   74.65%   74.66%           
=======================================
  Files         192      192           
  Lines       18969    18975    +6     
=======================================
+ Hits        14162    14167    +5     
- Misses       4807     4808    +1     

☔ View full report in Codecov by Sentry.

@meenchen meenchen self-assigned this Jan 14, 2026
@meenchen meenchen marked this pull request as ready for review January 14, 2026 17:51
@meenchen meenchen requested a review from a team as a code owner January 14, 2026 17:51
Comment on lines +183 to +188
if model_type in ["qwen3moe", "qwen3next"] and qformat == "nvfp4":
# Disable the attention projection layers to retain accuracy
quant_cfg["quant_cfg"]["model*.*attn*in_proj*"] = {"enable": False}
quant_cfg["quant_cfg"]["model*.*attn*q_proj*"] = {"enable": False}
quant_cfg["quant_cfg"]["model*.*attn*k_proj*"] = {"enable": False}
quant_cfg["quant_cfg"]["model*.*attn*v_proj*"] = {"enable": False}
@realAsma (Contributor) commented Jan 14, 2026

Is there an option to skip this setting? We are hardcoding the skipping of attention quantization here.

Cc @shengliangxu - a config system and model-based config examples could be helpful to improve the overall experience.

A collaborator replied:

Sure. For now, feel free to add an additional flag for auto quant.

@meenchen (Contributor, Author) replied:

We can refactor this part once the config system is ready.
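
A hypothetical sketch of the opt-in shape discussed above; the helper name and the disable_attn_quant parameter are illustrative only, not the repository's actual API:

# Hypothetical helper: gate the hardcoded override behind a parameter so callers can skip it.
def apply_model_specific_overrides(quant_cfg, model_type, qformat, disable_attn_quant=True):
    if disable_attn_quant and model_type in ["qwen3moe", "qwen3next"] and qformat == "nvfp4":
        # Disable the attention projection layers to retain accuracy.
        for pattern in [
            "model*.*attn*in_proj*",
            "model*.*attn*q_proj*",
            "model*.*attn*k_proj*",
            "model*.*attn*v_proj*",
        ]:
            quant_cfg["quant_cfg"][pattern] = {"enable": False}
    return quant_cfg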

@realAsma realAsma requested a review from shengliangxu January 14, 2026 18:30
@meenchen meenchen merged commit 6038451 into main Jan 14, 2026
35 checks passed
@meenchen meenchen deleted the weimingc/fix_ptq branch January 14, 2026 19:57